Automatic Lexical Acquisition Based on Statistical Distributions
Abstract
We automatically classify verbs into lexical semantic classes, based on distributions of indicators of verb alternations, extracted from a very large annotated corpus. We address a problem which is particularly difficult because the verb classes, although semantically different, show similar surface syntactic behavior. Five grammatical features are sufficient to reduce error rate by more than 50% over chance: we achieve almost 70% accuracy in a task whose baseline performance is 34%, and whose expert-based upper bound we calculated at 86.5%. We conclude that corpus-driven extraction of grammatical features is a promising methodology for fine-grained verb classification.
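The "more than 50% over chance" figure can be checked from the numbers in the abstract. The sketch below is only an illustration: the function name `relative_error_reduction` is hypothetical, and the system accuracy is taken as 0.70, an assumption that rounds up the "almost 70%" reported above, so the true reduction is slightly lower than the value computed here.

```python
def relative_error_reduction(baseline_accuracy: float, system_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the system eliminates."""
    baseline_error = 1.0 - baseline_accuracy
    system_error = 1.0 - system_accuracy
    return (baseline_error - system_error) / baseline_error


# Figures from the abstract.
BASELINE_ACCURACY = 0.34      # chance baseline
UPPER_BOUND_ACCURACY = 0.865  # expert-based upper bound

# Assumption: "almost 70%" is rounded up to 0.70 for illustration.
SYSTEM_ACCURACY = 0.70

system_reduction = relative_error_reduction(BASELINE_ACCURACY, SYSTEM_ACCURACY)
expert_reduction = relative_error_reduction(BASELINE_ACCURACY, UPPER_BOUND_ACCURACY)

print(f"Error reduction at 70% accuracy: {system_reduction:.1%}")
print(f"Error reduction at the expert upper bound: {expert_reduction:.1%}")
```

With a 34% baseline, the baseline error is 66%, so any accuracy above 67% halves the error rate. An accuracy just under 70% therefore fits the abstract's claim.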
Similar resources
The Effect of Interaction on Lexical Acquisition
This research showed that appropriate input and suitable contexts for interaction among students can lead to successful second language acquisition (SLA). This study, based on Swain's (2005) notion of collaborative dialogue, aimed to determine whether EFL learners participating in tasks based on negotiation of meaning collaborate with each other and, if so, to investigate the role of this behavior in ...
Automatic Verb Classification Using Distributions of Grammatical Features
We apply machine learning techniques to classify automatically a set of verbs into lexical semantic classes, based on distributional approximations of diatheses, extracted from a very large annotated corpus. Distributions of four grammatical features are sufficient to reduce error rate by 50% over chance. We conclude that corpus data is a usable repository of verb class information, and that c...
Automatic Verb Classification Using Distributions of Grammatical Features
We apply machine learning techniques to classify automatically a set of verbs into lexical semantic classes, based on distributional approximations of diatheses, extracted from a very large annotated corpus. Distributions of four grammatical features are sufficient to reduce error rate by 50% over chance. We conclude that corpus data is a usable repository of verb class information, and that cor...
Automatic Verb Classification Based on Statistical Distributions of Argument Structure
Automatic acquisition of lexical knowledge is critical to a wide range of natural language processing tasks. Especially important is knowledge about verbs, which are the primary source of relational information in a sentence--the predicate-argument structure that relates an action or state to its participants (i.e., who did what to whom). In this work, we report on supervised learning experimen...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are central to Machine Translation (MT) engines, as these engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...